



Appendix for "Fine-Grained Theoretical Analysis of Federated Zeroth-Order Optimization"

Neural Information Processing Systems

The main notations of this paper are summarized in Table 1 ("Descriptions of the main notations used in this work"); for example, N and n denote the total number of clients and the number of samples per client. The appendix first introduces the lemmas used in the proofs (throughout, e denotes the base of the natural logarithm), then proves the result stated in Part (b) and derives the optimization bound.
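The bounds in the appendix concern zeroth-order methods, which replace exact gradients with finite-difference probes of the loss. As a minimal sketch, not the paper's own algorithm, the standard two-point zeroth-order gradient estimator that such analyses typically build on looks like the following; the function names, smoothing radius mu, and step size are illustrative assumptions.

```python
import numpy as np

def zo_gradient(f, x, mu=1e-4, rng=None):
    """Two-point zeroth-order gradient estimator:
    grad f(x) is approximated by (d / (2*mu)) * (f(x + mu*u) - f(x - mu*u)) * u,
    where u is a random unit direction and d is the dimension."""
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[0]
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)  # uniform direction on the unit sphere
    return (d / (2.0 * mu)) * (f(x + mu * u) - f(x - mu * u)) * u

# Example: one local zeroth-order descent step on a toy quadratic,
# whose true gradient at x is x itself.
f = lambda x: 0.5 * np.sum(x ** 2)
x = np.ones(10)
x -= 0.1 * zo_gradient(f, x)
```

In expectation over the random direction u, this estimator recovers the gradient of a smoothed version of f, which is the property the convergence analyses of such federated zeroth-order schemes rely on.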







On sample complexity for covariance estimation via the unadjusted Langevin algorithm

Nakakita, Shogo

arXiv.org Machine Learning

We establish sample complexity guarantees for estimating the covariance matrix of strongly log-concave smooth distributions using the unadjusted Langevin algorithm (ULA). We quantitatively compare our complexity estimates on single-chain ULA with embarrassingly parallel ULA and derive that the sample complexity of the single-chain approach is smaller than that of embarrassingly parallel ULA by a logarithmic factor in the dimension and the reciprocal of the prescribed precision, with the difference arising from effective bias reduction through burn-in. The key technical contribution is a concentration bound for the sample covariance matrix around its expectation, derived via a log-Sobolev inequality for the joint distribution of ULA iterates.
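As a hedged sketch of the two estimators being compared, assuming for concreteness a Gaussian target so the potential gradient is linear; the step size, burn-in, and chain counts below are illustrative choices, not the paper's prescriptions.

```python
import numpy as np

def ula_chain(grad_U, x0, step, n_iters, rng):
    """Unadjusted Langevin algorithm:
    x_{k+1} = x_k - step * grad_U(x_k) + sqrt(2*step) * N(0, I)."""
    x = x0.copy()
    xs = np.empty((n_iters, x0.shape[0]))
    for k in range(n_iters):
        x = x - step * grad_U(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.shape[0])
        xs[k] = x
    return xs

rng = np.random.default_rng(0)
d = 5
A = rng.standard_normal((d, d))
Sigma = A @ A.T + d * np.eye(d)      # a well-conditioned target covariance
Sigma_inv = np.linalg.inv(Sigma)
grad_U = lambda x: Sigma_inv @ x     # potential U(x) = 0.5 * x' Sigma^{-1} x

# Single chain: discard a burn-in, then average over the remaining iterates.
xs = ula_chain(grad_U, np.zeros(d), step=0.05, n_iters=20000, rng=rng)
burn = 2000
single_chain_cov = np.cov(xs[burn:].T, bias=True)

# Embarrassingly parallel: many short chains, keeping only each final iterate.
finals = np.stack([ula_chain(grad_U, np.zeros(d), 0.05, 500, rng)[-1]
                   for _ in range(200)])
parallel_cov = np.cov(finals.T, bias=True)

print(np.linalg.norm(single_chain_cov - Sigma, 2),
      np.linalg.norm(parallel_cov - Sigma, 2))
```

The burn-in in the single-chain estimator is what the abstract refers to as effective bias reduction: iterates kept after burn-in are already close to the target distribution, whereas each parallel chain pays the bias of its final iterate separately.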


Collaborative Compressors in Distributed Mean Estimation with Limited Communication Budget

Vardhan, Harsh, Mazumdar, Arya

arXiv.org Machine Learning

Distributed high-dimensional mean estimation is a common aggregation routine in distributed optimization methods. Most of these applications call for a communication-constrained setting in which the vectors whose mean is to be estimated must be compressed before sharing. One could independently encode and decode these vectors to achieve compression, but that overlooks the fact that they are often close to each other. To exploit these similarities, Suresh et al. (2022), Jhunjhunwala et al. (2021), and Jiang et al. (2023) recently proposed multiple correlation-aware compression schemes. However, in most cases, the correlations have to be known for these schemes to work. Moreover, theoretical analysis of how gracefully these correlation-aware schemes degrade with increasing dissimilarity is limited to the $\ell_2$-error in the literature. In this paper, we propose four different collaborative compression schemes that agnostically exploit the similarities among vectors in a distributed setting. Our schemes are all simple to implement and computationally efficient, while yielding substantial savings in communication. Our analysis shows how the $\ell_2$, $\ell_\infty$, and cosine estimation errors vary with the degree of similarity among the vectors.
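The paper's four schemes are specified in the paper itself; the sketch below is only a hypothetical illustration of the underlying idea that similarity can be exploited agnostically: one client ships its vector at full precision as a shared reference, and every other client sends a 1-bit sign pattern of its difference from that reference plus a single scale, so the decoding error shrinks as the vectors grow more similar. All names here (collaborative_mean, the mean-absolute-deviation scale) are illustrative assumptions, not the paper's constructions.

```python
import numpy as np

def collaborative_mean(vectors):
    """Hypothetical collaborative compressor: vectors[0] is sent at full
    precision as a shared reference; each remaining client sends only the
    sign pattern of its difference from the reference (d bits) plus one
    scalar scale. The closer the vectors are to the reference, the smaller
    the scales and hence the reconstruction error of the mean."""
    ref = vectors[0]                      # full-precision reference (d floats)
    decoded = [ref]
    for v in vectors[1:]:
        diff = v - ref
        scale = np.mean(np.abs(diff))     # one float per client
        decoded.append(ref + scale * np.sign(diff))  # d bits + 1 float
    return np.mean(decoded, axis=0)

rng = np.random.default_rng(1)
d, n = 1000, 20
base = rng.standard_normal(d)
vectors = [base + 0.1 * rng.standard_normal(d) for _ in range(n)]  # similar vectors
true_mean = np.mean(vectors, axis=0)
est = collaborative_mean(vectors)
print(np.linalg.norm(est - true_mean) / np.linalg.norm(true_mean))
```

Note that this toy scheme needs no prior knowledge of the correlations: the per-client scale adapts to however similar the vectors happen to be, which is the agnostic behavior the abstract emphasizes.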